The OLAC Metadata Set and Controlled Vocabularies

Authors

  • Steven Bird
  • Gary Simons
Abstract

As language data and associated technologies proliferate and as the language resources community rapidly expands, it has become difficult to locate and reuse existing resources. Are there any lexical resources for such-and-such a language? What tool can work with transcripts in this particular format? What is a good format to use for linguistic data of this type? Questions like these dominate many mailing lists, since web search engines are an unreliable way to find language resources. This paper describes a new digital infrastructure for language resource discovery, based on the Open Archives Initiative, and called OLAC – the Open Language Archives Community. The OLAC Metadata Set and the associated controlled vocabularies facilitate consistent description and focussed searching. We report progress on the metadata set and controlled vocabularies, describing current issues and soliciting input from the language resources community.


Similar resources

The Open Language Archives Community and Asian Language Resources

The Open Language Archives Community (OLAC) is a new project to build a worldwide system of federated language archives based on the Open Archives Initiative and the Dublin Core Metadata Initiative. This paper aims to disseminate the OLAC vision to the language resources community in Asia, and to show language technologists and linguists how they can document their tools and data in such a way ...


Vocabulary Conversion: Performance with Controlled and Uncontrolled Terms and Tags Technical

Controlled and uncontrolled indexing terminology and metadata may be converted from one to another. Decision criteria are developed that can be used to determine which terms should be assigned when converting vocabularies. Methods are developed for computing the parameters of these systems, as well as means for estimating the parameters when given limited information. These conversion technique...


Find and Combine Vocabularies to Design Metadata Application Profiles using Schema Registries and LOD Resources

A metadata schema which defines constraints about metadata records is a fundamental resource for metadata interoperability. Building interoperable metadata schemas has been a main topic of the Dublin Core since its early days. It is important to make use of existing metadata schemas to develop a new schema in order to minimize newly defined metadata vocabularies, which is how DCMI has developed...


Advanced Search Technologies for Unfamiliar Metadata

Searching of databases (textual or numeric) is likely to be effective and efficient only if the user is familiar with the classification, categorizing, and indexing schemes (metadata vocabularies) being searched. Therefore, it is obviously beneficial to provide a bridge between the user’s ordinary language and the metadata vocabularies of the unfamiliar database in order to compensate for abbre...


Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources

As language data and associated technologies proliferate and as the language resources community expands, it is becoming increasingly difficult to locate and reuse existing resources. Are there any lexical resources for such-and-such a language? What tool works with transcripts in this particular format? What is a good format to use for linguistic data of this type? Questions like these dominat...



Journal:
  • CoRR

Volume: cs.CL/0105030

Publication date: 2001